91 research outputs found
Efficient Pattern Matching on Binary Strings
The binary string matching problem consists of finding all occurrences of
a pattern in a text, where both strings are built on a binary alphabet. This is
an interesting problem in computer science, since binary data are omnipresent
in telecom and computer network applications. Moreover, the problem also finds
applications in the field of image processing and in pattern matching on
compressed texts. Recently it has been shown that adaptations of classical
exact string matching algorithms are not very efficient on binary data. In this
paper we present two efficient algorithms for the problem, designed to completely
avoid any reference to bits so that pattern and text can be processed byte by byte.
Experimental results show that the new algorithms outperform existing solutions
in most cases. Comment: 12 pages
The Many Qualities of a New Directly Accessible Compression Scheme
We present a new variable-length computation-friendly encoding scheme, named
SFDC (Succinct Format with Direct aCcesibility), that supports direct and fast
accessibility to any element of the compressed sequence and achieves
compression ratios often higher than those offered by other solutions in the
literature. The SFDC scheme provides a flexible and simple representation
geared towards either practical efficiency or compression ratios, as required.
For a text of length over an alphabet of size and a fixed
parameter , the access time of the proposed encoding is proportional
to the length of the character's code-word, plus an expected
overhead, where
is the -th number of the Fibonacci sequence. Overall, it uses
bits, where is the length of the encoded string.
Experimental results show that the performance of our scheme is, in some
respects, comparable with the performance of DACs and Wavelet Trees, which are
among the most efficient schemes. In addition, our scheme is configured as a
\emph{computation-friendly compression} scheme, as it has several features
that make it very effective in text processing tasks. In the string matching
problem, which we take as a case study, we show experimentally that the new
scheme enables results that are up to 29 times faster than standard
string-matching techniques on plain texts. Comment: 33 pages
On the bit-parallel simulation of the nondeterministic Aho-Corasick and suffix automata for a set of patterns
In this paper we present a method to simulate, using the bit-parallelism technique, the nondeterministic Aho-Corasick automaton and the nondeterministic suffix automaton induced by the trie and by the Directed Acyclic Word Graph for a set of patterns, respectively. When the prefix redundancy is nonnegligible, this method yields, compared to the original bit-parallel encoding with no prefix factorization, a representation that requires smaller bit-vectors and, correspondingly, fewer words. In particular, if we restrict to single-word bit-vectors, more patterns can be packed into a word. We also present two simple algorithms, based on this technique, for searching a set P of patterns in a text T of length n over an alphabet $\Sigma$ of size $\sigma$. Our algorithms, named Log-And and Backward-Log-And, require $O((m+\sigma)\lceil m/w\rceil)$ space, and work in $O(n\lceil m/w\rceil)$ and $O(n\lceil m/w\rceil l_{min})$ worst-case searching time, respectively, where w is the number of bits in a computer word, m is the number of states of the automaton, and $l_{min}$ is the length of the shortest pattern in P.
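For orientation, the baseline that the abstract calls the original bit-parallel encoding with no prefix factorization can be sketched as a multi-pattern Shift-And: all patterns are laid out side by side in one bit-vector, one bit per pattern character. This is only an illustrative sketch of that classical baseline, not the Log-And or Backward-Log-And algorithms of the paper. The function name `multi_shift_and` and the output format are assumptions made here, and Python's unbounded integers stand in for the $\lceil m/w\rceil$ machine words.

```python
def multi_shift_and(patterns, text):
    """Report (start, pattern_index) for every occurrence of any pattern.

    Baseline encoding: one bit per pattern character, no sharing of
    common prefixes between patterns.
    """
    masks = {}   # character -> bit mask of positions holding that character
    init = 0     # bits marking the first position of each pattern
    final = {}   # bit index of a pattern's last position -> pattern index
    pos = 0
    for k, p in enumerate(patterns):
        if not p:
            raise ValueError("empty patterns are not supported")
        init |= 1 << pos
        for i, c in enumerate(p):
            masks[c] = masks.get(c, 0) | (1 << (pos + i))
        final[pos + len(p) - 1] = k
        pos += len(p)
    final_mask = sum(1 << b for b in final)

    state = 0
    hits = []
    for j, c in enumerate(text):
        # Advance every active state by one and restart each pattern.
        # A bit shifted from the end of one pattern into the start of the
        # next is harmless, because start bits are set by `init` anyway.
        state = ((state << 1) | init) & masks.get(c, 0)
        matched = state & final_mask
        while matched:
            low = matched & -matched              # lowest set bit
            k = final[low.bit_length() - 1]
            hits.append((j - len(patterns[k]) + 1, k))
            matched ^= low
    return hits
```

With the patterns "ab", "abc" and "bc", the shared prefix "ab" is stored twice, so the bit-vector needs 7 bits. This is the prefix redundancy that the factorized representation described in the abstract is meant to reduce.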
A compact representation of nondeterministic (suffix) automata for the bit-parallel approach
We present a novel technique, suitable for bit-parallelism, for representing both the nondeterministic automaton and the nondeterministic suffix automaton of a given string in a more compact way. Our approach is based on a particular factorization of strings which, on average, allows automata state configurations for strings of length greater than w to be packed into a machine word of w bits. We adapted the Shift-And and BNDM algorithms using our encoding and compared them with the original algorithms. Experimental results show that the new variants are generally faster for long patterns.
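As a point of reference, the original Shift-And algorithm mentioned above can be sketched as follows. It uses the standard encoding, in which one bit is kept per pattern position, so a pattern of length m needs m bits. This is a minimal sketch of the classical algorithm only, not the compact encoding proposed in the paper. The function name `shift_and` is chosen here for illustration, and Python integers are used in place of fixed-width machine words.

```python
def shift_and(pattern, text):
    """Return the start positions of all occurrences of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # masks[c] has bit i set exactly when pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)   # bit of the final automaton state

    state = 0               # bit i set: pattern[0..i] matches a suffix of the text read so far
    hits = []
    for j, c in enumerate(text):
        # Shift all active states forward, activate the initial state,
        # and keep only the states whose character matches c.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits
```

For example, `shift_and("aba", "ababa")` reports the overlapping occurrences at positions 0 and 2.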